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Abstract. Consider the problem when Xi,X2, ■ ■ ■ ,X n are distributed on a circle following 
an unknown distribution F on S 1 . In this article we have consider the absolute general set- 
up where the density can have local features such as discontinuities and edges. Furthermore, 
there can be outlying data which can follow some discrete distributions. The traditional Kernel 
Density Estimation methods fail to identify such local features in the data. Here we device 
a non-parametric density estimate on S 1 , by the use of a novel technique which we term as 
Fourier Spline. We have also tried to identify and incorporate local features such as support, 
discontinuity or edges in the final density estimate. Several new results are proved in this re- 
gard. Simulation studies have also been performed to see how our methodology works. Finally 
a real life example is also shown. 
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1 Introduction 

Let {X ,A) be a metric space with a Borel a-field and P be a probability distribution on 
(X,A). Further we assume that this space has an inherent topological structure. Let us say 
we have samples Xi, X%, . . . , X n i.i.d from P. Let the us denote the empirical distribution 
as P n , where, 



Note that the empiricals may be given in terms of histogram data as well, i.e. we can have 
data from disjoint partitions A\,A2, ... on the sample space, with uniform Haar measure 
A. In such a situation the empirical distribution is given by 



where rn denotes the number of points in Ai for i = 1 . . . k. 

Assuming that the data is from some unknown density f(x) a natural component of ex- 
ploratory data analysis is to estimate the function /(•). Any density estimator is a descriptor 
of the population distribution, and when it is known that the population distribution is ab- 
solutely continuous with respect to some standard invariant measure, the Radon-Nykodym 
derivative is the density. Now, the major question is what is the central problem in using 
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empirical distribution for density estimation? Note that P(f) = J fdP n is unbiased and 

Var{P{f)) goes to as 1/n. However, in case we assume that P has a derivative p with 
respect to the Haar measure A on (X,A) (in the sense of the Radon- Nykodym derivative) 
the empirical estimate does not work. For example, it is a known that in M, cou ^^ b ) does 
not work. 

There are few different approaches in practice which are majorly used, viz., Kernel 
density estimation, Spline smoothing and Orthogonal Series. 



1.1 Kernel Density Estimation 

Kernel density estimation (KDE) is a non-parametric way to estimate the probability density 
function of a random variable. Kernel density estimation is a fundamental data smoothing 
problem where inferences about the population are made, based on a finite data sample. The 
solution to the smoothing problem lies in the space of all continuously twice differentiable 
functions, with square integrable second derivative. For KDE the Reisz representation is 
essentially used. P = Q if J 4>dP = J <pdQ for all bounded continuous functions (j). Let 
Kh(x — •), x E X denote the family of bounded continuous functions for which the estimate 
is unbiased, h being a smoothing parameter, i.e. 

r i n 

P(K h (x -■))= / K h (x - y)dP n = P n (K h (x -■)) = - £ K h (x - X { ) (3) 
J n i=i 

Kh satisfies for bounded continuous functions, 



f{x) = lim / K h (x - y)f(y)d\(y) = 5 x (f) (4) 

Now dv = Kh(x — y)dX(y) can be thought of as an approximation to the degenerate measure 
8 X when h is small. Note the v « A with Kh(x — •) as its Radon-Nykodym derivative. Thus 
(3) defines a density estimate. 

1 n 

f(x ] h) = -Y,K h (x-X i ) (5) 
i=l 

When X = R, we have the following results from Silverman (1986) |28| , 

h 2 f'°° 

Bias(/(x)) « -f"{x) j z 2 K(z) dz (6) 
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Var(/>)) « \f{x) I K\z)dz (7) 

h 4 / f°° \ 2 1 f'°° 

MSE(f(x))^-f"(x) 2 ( z 2 K(z)dz\ +—f{x) j K 2 (z)dz (8) 



oo / J —oo 



Integrating with respect to x, we get 



MISE(/) « \h^k 2 M) + -Lj 2 (9) 



where K h (x) = K(x/h)/h, k 2 = f z 2 K(z)dz, p(f) = f f"(x) 2 dx and j 2 = f K 2 (z)dz. 
Based on this the optimal value of the bandwidth is derived as 
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Thus, the minimal MISE can be shown to be 
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1.2 Spline Smoothing 

1.2.1 Classical Smoothing Spline Background : 

The usual smoothing spline problem [30] can be considered as a minimization problem. 
Let (xi,Yi); x% < x 2 < ■ . . < x n ,i £ Z be a sequence of observations, modelled by the 
relation Y\ = \i{xi). The smoothing spline estimate fi of the function [i is defined to be the 
minimizer [13] (over the class of twice differentiable functions) of 



Note that 

— A > is a smoothing parameter, controlling the trade-off between fidelity to the data 
and roughness of the function estimate. 

— The integral is evaluated over the range of the X{. 

— As A — > (no smoothing), the smoothing spline converges to the interpolating spline. 

— As A — > oo (infinite smoothing), the roughness penalty becomes paramount and the 
estimate converges to a linear least squares estimate. 

1.2.2 Typical Solution to the Smoothing Problem 

It is useful to think of fitting a smoothing spline in two steps: 

— First, derive the values £i(xi); i = 1, . . . , n. 

— From these values, derive fi{x) for all x. 

Now, consider the second step first. 

Given the vector rh = (fl(xi), . . . , fi{x n )) T of fitted values, the sum-of-squares part of 
the spline criterion is fixed. It remains only to minimize f fi"(x) 2 dx, and the minimizer is 
a natural cubic spline that interpolates the points (xi,p,(xi)). This interpolating spline is a 
linear operator, and can be written in the form 



where fi(x) are a set of spline basis functions. As a result, the roughness penalty has the 
form 
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where the elements of A are J f" (x)f'-(x)dx. The basis functions, and hence the matrix A, 
depend on the configuration of the predictor variables Xi, but not on the responses Yi or rh. 
Now going back to the first step, the penalized sum-of-squares can be written as 

\\Y - m\\ 2 + Xm T Am, (15) 

where Y = (Yi, . . . , Y n ) T . Minimizing over rh gives 

rh = (I + \A)- l Y. (16) 



1.3 Orthogonal Series 

It helps in approximations in a suitable Hilbert space % of functions containing smooth 
functions. However, since the original probability distribution is unknown, the choice of T~L 
becomes arbitrary. Different techniques are needed for different spaces, for eg. in Sobolev 
space [GIVE REF], we take an orthonormal basis {u m } me z of %. 



P ( U m j — J t) ( V>r. 
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Note that here using feature transformation we can transform the probability into the 
space of sequences, i.e. P n — > (P„(u m ) : m G Z). Although it is true that Y P{. u m)'Um 
is convergent in H, the formal Fourier transformation, ^P n (u m )u m does not converge in 
%. Mimicking the KDE approach, the smoothing here requires the multiplication of the 
empirical coefficients by a smoothing sequence {c m } such that YI tfn\^rn{urn)\ 2 < oo. 
Now P(4>) with (ft = Yi u m, <ft) u m can be written as 

P(<ft) = 1^ = 1 (ft^dX = ^<u m ,0> J uJ^dX (18) 

Thus a natural estimate of P{(ft) is then 

p(<t>) = E< 

(19) 

Now by the Cauchy-Schwarz inequality the right hand side is convergent in T~L. Hence the 
orthogonal series estimator of dP/dX is extracted by applying uniqueness of inverse Fourier 
transformation [GIVE REFS] to the coefficients {P n (um)c m } i.e. 

= ^2Pn(u m )c m U m (20) 



Remarks : The choice of the Hilbert space is unknown. There are several different Hilbert 
spaces and basis which can be used, such as the Fourier basis, and wavelets based on 
unknown smoothness parameters and the choice of the thresholding sequence {c m } [9]. Due 
to the arbitrary-ness in the choice of Hilbert space, the choice of {c m }, its efficiency and 
error estimate is user driven. 

The orthogonal series estimate are also convolution estimates ^Y-K(x — Xi) where 
K{x) = Y c m u m (x). This holds by the convolution property of the Fourier transformation. 



There is also an issue regarding the sample space itself. If (X, A) does not have a nice 
topological group structure, (for eg. manifolds in M n , etc), then the question of estimation 
of density is not well posed because we do not have a base measure with respect to which 
we can define the density. In such cases we look for a transformation T which is a 1-1 
measurable open mapping that maps (X,A) into a locally compact topological group. Since 
there exists a natural Haar measure A in the topological group, it makes sense to talk about 
density in this situation. Note that T must satisfy \(T(X) C ) = 0. If we denote the density 
in this space by gx = dPo ^x * ; we can estimate all integrals of the form f h(T)dP by 



Such functions generate the cr-field cr(T). If cr(T) = B (Borel <7-field), i.e., if T is a 1-1 
isomorphic mapping, we can find all such integrals in this indirect fashion. 

1.4 Focus on estimation on S^ 1 with respect to the Haar measure 

The major reasons for concentrating on S 1 are the as follows 

— Trigonometric Fouries basis are computationally very efficient, with tools such as the 
Fast Fourier Transformations, etc 

— The Bochner's Theorem which states that, 

Theorem 1. Every positive definite function Q is the Fourier transform of a positive 
finite Borel measure. 

Proof. Let i*b(R) be the family of complex valued functions on M with finite support, 
i.e. f{x) = for all but finitely many x. The positive definite kernel K(x,y) induces 
a sesquilinear form on i*o(R). This in turn results in a Hilbert space (H,{ , )) whose 
typical element is an equivalence class [g\. For a fixed t in M, the "shift operator" 
Ut defined by (Utg)(x) = g(x — t), for a representative of [g] is unitary. In fact the 

map t h> Ut is a strongly continuous representation of the additive group M. By 
Stone's theorem, there exists a (possibly unbounded) self-adjoint operator A such that 
U-t = e~ lAt . This implies there exists a finite positive Borel measure \x on K where 
(U-t[eo\, [eo]) = / e~ lAt d[i(x), where eo is the element in i*b(R) defined by eo(rra) = 1 if 
m = and otherwise. Because (U-t[eo\, [eo]) = K(—t,0) = Q(t), the theorem holds. 

■ 

The Bochner's Theorem gives a 1-1 correspondence between non-negative definite prob- 
ability measure and non- negative definite sequences. 

— The technique of Fourier spline. Choosing the orthogonal sequence as {e mx } satisfies 
other nice properties in terms of homomorphisms, error in approximations, etc. The 
choice of the penalty can also be easily be determined. 

The main criticism of using the Fourier basis is the fact it fails to capture local features 
such as discontinuities and edges. 

A survey of the literature shows several works on the adaptation to unknown smoothness 
by methods such as wavelet shrinkage (Donoho and Johnstone (1995)) [8], aggregation of 
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thresholded wavelet estimators (Chesneau and Lecue (2009)) [7]. Several other methods in- 
clude, the cross-validation methods (Nason (1995) [22J and Jansen (2001) [15]), the methods 
based on hypothesis tests (Abramovich, Benjamini, Donoho and Johnstone (2006) [T|), the 
Lepski methods (Juditsky (1997) [18]) and the Bayesian methods (Abramovich, Sapatinas 
and Silverman (1998) [2]). 

Edge preserving function estimation has also been studied in literature especially by 
MAP estimators, (see Bouman and Sauer (1992) [5]). 

In this paper we develop a technique for estimation of densities on S 1 using Fourier Spline 
and address the issue of adapting non-smooth local features using a hybrid of frequency 
and spatial domain. Few results on the kernel density estimation can be found in Pelletier 
(2005) [23] and Taylor (2008) [29]. 

The classical kernel density estimator was first proposed by Fisher |11] for data lying 
on the circle, in which he adapted linear data methods of Silverman [28] and used a quartic 
kernel function K{9) = 0.9375(1 — 9 2 ) 2 . However, when using data on the circle, we cannot 
use distance in Euclidean space, so all differences 6 — 6i should be replaced by considering 
the angle between two vectors: 

dt(6) = \\9-6i\\ = mm(\0-di\,2w-\e-ei\) (22) 

This may also be written as d{ = cos^ 1 {x , Xi) where x' = (cos 9, sin 9) is a unit vector. 
A more natural choice for the kernel function is therefore one of the commonly used circular 
probability densities, such as the wrapped normal distribution, or the von Mises distribution. 
This leads to an alternative representation for the kernel density estimate [H] 

1 - 

f(G;h) = -Y j K h {l-x'x i ) (23) 

i=l 

In studying properties of kernel density estimates in Euclidean space, it is common to 
take Taylor series approximations to give an asymptotic form for the bias and variance. 
These can then be combined to give an asymptotically optimal choice for the smoothing 
parameter; see, for example, [28J. For data lying on the q - dimensional sphere (q > 2) |12] 
described the asymptotic bias and variance of two classes of kernel estimators. This was 
done by the use of directional derivatives, thus making the results a close analogue of the 
Taylor series methods used for data in Euclidean space. 

However, the classical kernel density estimators does not give a good estimate in a 
general set-up, when data might follow mixture distribution of a continuous and discrete 
distribution with non intersecting support. In order to illustrate, let us consider the fol- 
lowing. Suppose Xi, X2, ■ ■ ■ X n ~ N(0, 1) truncated on [—§,§) with probability 1 — e and 
a discrete distribution with probability e, where the discrete distribution takes the values 

and — ^ with equal probability. Note that this discrete distribution can be viewed as 
outlying data. The mean squared error using the usual kernel density is estimated via sim- 
ulation study. The simulations have been performed using different sample sizes and by 
taking e = 0.01,0.05 and 0.1. The percentage increase in the MISE when compared to the 
case taking e = is shown is Table 1. The whole process is repeated N = 10 5 times. 

Note that the mean square error increases almost 15% when only 1% outlying data 
is present. For 10% outlying data, the mean square error for the classical kernel estimate 



Table 1. Mean Integrated Square Error of Classical Kernel Estimate 



Sample Size 


Value 


Percentage 


n 


Of € 


increase in MISE 




0.01 


14.86% 


100 


0.05 


47.68% 




0.10 


65.68% 




0.01 


14.90% 


500 


0.05 


57.95% 




0.10 


79.13% 




0.01 


14.95% 


1000 


0.05 


60.26% 




0.10 


80.77% 



increases about 81%. The Figure 1 shows the estimated density for the different e values. 
Note that for e = 0.10, the high discrete nature of the data is being captured by a large 
smooth region instead of peaks at the two specified points. This raises the MSE of the Kernel 
density estimate as seen in Table 1. This motivates us to work on such kind of problems 
when he density is actually a mixture of a smooth and discrete family. We have adapted a 
method based on Splines using Fourier techniques, to work on such problems. Several results 
have been proved in this regard. A comparative study has also been performed with other 
known techniques. Optimality conditions and optimal choice of smoothing parameters are 
also calculated theoretically. The detailed methodology is explained in Section 2. 
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Fig. 1. Kernel Density Estimates for Different e Values 

It is also important to note that the discrete distribution with a disjoint support could 
actually be interpreted as some arbitrary outliers in the situation when we are dealing 
with circular data. Thus when dealing with such outliers two major aspects come to the 
limelight, viz. detection and accommodation. In this article we have developed a novel 
method of dealing with such outliers and accommodating them inherently. Details on the 
procedure are explained later in Section 3.2. 



It is also particularly difficult to detect local features while using such kernel density 
estimators. Local features such as boundary points of support, points of discontinuity or 
the presence of sharp edges in the density or both, are not at all easy to identify, since 
during smoothing, such local features are lost. We have developed a technique to identify 
such local features and adapt them accordingly in the final estimate. Details are provided 
later in Section 3.3. The figure below shows a density drawn from a mixture of a uniform 
density and a triangular density, showing presence of both discontinuity and edges. 



T 




0.7B5398 1 .570796 2.356194 3.141593 3.926991 4.7123S9 5.497787 6.28 



Fig. 2. Example of a Density with local features such as discontinuity at J and ^ and edges at ^f- and ^f- 



With the knowledge of such points of local feature, we shall be able to break the circle 
into small compact disjoint sets which over which the density is smooth. So we adapt our 
methodology of Fourier Splines to estimate the density in these compact sets. Finally at 
the points of detected localizations we estimate the local features using a two parameter 
exponential model. Finally, we join these two estimates to get the final estimate, based on 
a method in general topology called the 'Partition of Unity' [21 J 

1.5 Using Partitions of Unity 

A partition of unity of a topological space X is a set of continuous functions, {pi}iei, from 
X to the unit interval [0,1] such that for every point, x £ X, 

— there is a neighbourhood of x where all but a finite number of the functions are 0, and 

— the sum of all the function values at x is 1, i.e., Yliei Pi( x ) = 1- 

Sometimes, the requirement is not as strict: the sum of all the function values at a 
particular point is only required to be positive rather than a fixed number for all points 
in the space. Partitions of unity are useful because they often allow one to extend local 
constructions to the whole space. 

The existence of partitions of unity assumes two distinct forms: 



— Given any open cover {C/jjie/ of a space, there exists a partition {pi}i^j indexed over 
the same set / such that supp pi C U{. Such a partition is said to be subordinate to the 
open cover {Ui}i. 

— Given any open cover {Ui}^! of a space, there exists a partition {pj}j^j indexed over a 
possibly distinct index set J such that each pj has compact support and for each j £ J, 
supp pj C Ui for some i £ I. 

Thus one chooses either to have the supports indexed by the open cover, or the supports 
compact. If the space is compact, then there exist partitions satisfying both requirements. 

The construction uses modifiers (bump functions), which exist in the continuous and 
smooth manifold categories, but not the analytic category. Thus analytic partitions of unity 
do not exist. 



Fig. 3. A partition of unity of a circle with four functions. The circle is unrolled to a line segment (the 
bottom solid line) for graphing purposes. The dashed line on top is the sum of the functions in the partition. 

For our use, partition of unity helps us in localizing the problem. Based on this, any 
smooth function / can be decomposed as 



where (1 — YliPi) f denotes the smooth part and Y^iPif covers the localization. We treat 
these two separately and finally add to get the final estimate such that the final estimate 
reflects the localization of the problem adequately. The novelty of our approach lies on the 
fact that along with the smooth estimate of the density, various local and other features are 
also shown. Instead of just a vector based output, a complete structure is obtained, which 
enables the data to be stored in a structural form. Thus being of utmost importance in 
situations pertaining to database handling. 

Remarks : Estimation of pif, p%f,... are usually based on wavelets. We have used a local 
adaptation of the wavelet procedure using an appropriate basis. Several methods of such 
adaptation are available in literature, especially the work by Donoho and Johnstone [9]. 
We have used local exponential model to estimate these functions, while the Fourier Spline 
technique is used to estimate the smooth function. The window for p\ and p2, ■■■ (more if 
we detect more points) are determined experimentally. Note that these are anyway centred 
at the detected locations of discontinuities. We apply this technique on several simulated 
data, and the results are shown in Section 5. 
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The rest of the article is organized as follows; In Section 2, we focus on the methodology 
for density estimation using the Fourier spline approach, giving details on MSE and penalty 
selection. In Section 3 we adapt our method to local features (explaining the various local 
features) and the strategy to handle (i) support and outlier issue (ii) local exponential 
modelling of jump discontinuities and edges. Section 4 deals with the concept of unifying 
the local estimates and the smooth function estimates through an idea of partition of unity. 
Simulated study, real life data and discussions follow in Sections 5, 6 and 7. Finally we 
conclude in Section 8. 

2 Smooth Density Estimation Using Fourier Spline 

In our situation we are working with observations on a circle which are assumed to follow a 
probability distribution which is absolutely continuous with respect to the Lebesgue measure 
or the Haar measure. And our aim is to find its density estimate. There are several methods 
discussed in literature [21] regarding the estimation of density on the circle such as the 
Kernel Density Estimates with detailed study on their Bandwidth selection |29j . 

In this article, we propose a method to use the Fourier Splines technique to get an 
estimate of the density at a point x. Comparative study has been performed between 
our procedure and the usual procedures based on Kernel Density Estimation. We have 
used Cross-validation [6J and Plug-in [IS] methods in our comparative study. We have not 
only used the usual Epanechnikov kernel but also a custom kernel as K(9) oc cos 2 (#) for 
6 £ [— ^, \ ). The Mean Square Error studies are shown in Table 2. Finally we have incor- 
porated a penalty selection method in the procedure and found its optimal estimate using 
the bias-variance trade-off calculations. Repeated simulation study has been performed to 
computationally prove the accuracy of our selection estimates. 

2.1 Fourier Series on Circles 

Before we begin the detailed theory, let us give a brief introduction to the General Fourier 
Transformation theory. 

2.1.1. Classical Fourier Transformation [10J : There are several common conventions 
for defining the Fourier transform / of an integrable function / : R — > C .In this article will 
use the definition: 



When the independent variable x represents time, the transform variable £ represents 
frequency. Under suitable conditions, / can be reconstructed from / by the inverse trans- 
form: 




for every real number £. 



(24) 




for every real number x. 



(25) 



The statement that / can be reconstructed from / is known as the Fourier integral 
theorem, and was first introduced in Fourier's Analytical Theory of Heat (Fourier 1822, 



p. 525), (Fourier & Freeman 1878, p. 408), although what would be considered a proof by 
modern standards was not given until much later (Titchmarsh 1948, p. 1). The functions / 
and / often are referred to as a Fourier integral pair or Fourier transform pair. 



Few important properties: 

— Plancherel theorem and Parseval's theorem : Let f(x) and g{x) be integrable, and 
let /(£) and g(£) be their Fourier transforms. If f(x) and g{x) are also square-integrable, 
then we have Parseval's theorem (Rudin 1987, p. 187) |27| : 



f(x)g(x)dx = / /(0s(0d£, (26) 

J — oo 

where the bar denotes complex conjugation. 

The Plancherel theorem, which is equivalent to Parseval's theorem, states (Rudin 1987, 
p. 186) [27]: 

oo /*oo 2 

|/(x)| 2 dx= / f(0 d£. (27) 

-oo J —oo 

The Plancherel theorem makes it possible to define the Fourier transform for functions 
in 1? (K) . The Plancherel theorem has the interpretation in the sciences that the Fourier 
transform preserves the energy of the original quantity. 



— Convolution theorem : The Fourier transform translates between convolution and 
multiplication of functions. If f(x) and g(x) are integrable functions with Fourier trans- 
forms /(£) and g(^) respectively, then the Fourier transform of the convolution is given 
by the product of the Fourier transforms /(£) and g(£) (under other conventions for the 
definition of the Fourier transform a constant factor may appear). 
This means that if: 

/oo 
f(y)g(x-y)dy, (28) 
-oo 

where * denotes the convolution operation, then: 

ko = f(o-m- (29) 

Conversely, if f(x) can be decomposed as the product of two square integrable functions 
p(x) and q(x), then the Fourier transform of f(x) is given by the convolution of the 
respective Fourier transforms p(£) and q(£,)- 

2.1.2. General Theory on Circle : Let Z\, Zi, . . . , Z n be the observed data on S . Let 
F denote their distribution on S 1 . Note that each Zj = e* 9j and F n = 4 Ym=i ■ Also 

r-2-K -i n -i n i n 

F n {k) = / e ikx dF n {x) = -Y e ik9] = ~ Y Z i = % = " y"{cos(it^) + i sin(^,,)} 

JU i=l i=l j=l 

(30) 

Clearly, F n (0) = 1 and F n {—k) = F n {k). Before we move ahead, we first prove that there is 
a 1-1 correspondence between the data and the empirical moments. 



Proposition 1. Empirical Moments have a 1-1 correspondence with the Data. 

Proof. Suppose we know all the empirical moments, i.e., we know, ^ Zj, ^ Z 2 , . . . , ^ Z™. 
We need to find all the data. Note the data, Zi,...,Z n forms the root of the polynomial 
niLi ( z ~ Zi)- We can find all the coefficients of the power of z from the empirical moments. 
For eg. the coefficient of z 2 is Yli^j ZiZj, which can be calculated from the moments, as 
Yli^j ZiZj = (Yl Zi) 2 — ^2Zf. Thus, if we know all the empirical moments we can compute 
the coefficient of all powers of z of the polynomial. Hence, the roots of the polynomial will 
be the data. Conversely, if we know the data, we can obviously find the empirical coefficient. 
Thus, there is a 1-1 correspondence between the two. Hence proved. 



Now, given a positive finite Borel measure /x on the real line M, the Fourier transform 
Q of fi is the continuous function 

Q(t) = [ e- Ux dfi(x). (31) 

Q is continuous since for a fixed x, the function e~ ltx is continuous and periodic. The 
function Q is a positive definite function, i.e. the kernel K(x,y) = Q(y — x) is positive 
definite; Now Bochner's Theorem says that the converse if true. 

Thus in case of data on the circle, by the Bochner's Theorem, the moment transformation 
E(Z k ), transforms the problem from the measure space to the sequence space of bi-infinite 
sequences which are non- negative definite. Note that the solution to the Smoothing Spline 
problem in circle, belongs to the class of non-negative definite sequences. Thus Bochner's 
Theorem gives a 1-1 correspondence between non- negative definite probability measure and 
non-negative definite sequences. Now all square summable non- negative definite sequences 
are equivalent to all measures which are absolutely continuous with respect to the Haar 
measure satisfying f \ f"\ 2 d\ < oo. This result gives rise to Fourier Splines. Let us explain 
how. 

First note that the smoothing spline problem, is actually a curve fitting problem, where 
the minimization takes place on the space of all functions with square integrable second 
derivative, correspondingly on the space of non-negative definite sequences when the data is 
coming from a circle. However, our problem is actually density estimation, where we have a 
data function Y(x) and we need a fitted function fi{x) such that the I? norm is minimized, 
that is, we need to minimize 

±- J 2 J \Y(x)-Kx)\ 2 dx (32) 

We know that the empirical Fourier coefficients is a sequence in C. Thus, given data on 
the circle, we can transform P n — > P n = (uq,u±,U2, ■ ■ ■) € C. Similarly, the density can be 
transformed, /—>■/ = (no, f^i, ■ ■ ■)■ Now, if we are in the situation, where we know that 
the density belongs to L 2 (0, 2ir), then using the empirical Fourier coefficients we can derive 
the density using the inverse Fourier transformations, i.e. f(x) — > f(x) = ^Uje 1 ^. Note 
that we can write L 2 (0, 2tt) as the class of functions, T n defined by 

f n/2 \ 

Fn = I f ■ f(x) = ^2 Uje~ ljx where (uo,ui, . . .) G C and ^ |u«| 2 < oo > , (33) 

I j=-n/2 I 



where we have consider the coefficients to be for \j\ > ^. Thus our aim is to get a function 
belonging to this class. 

Now because of the form of the Fourier coefficients on the circle, the minimization 
problem given in (23) reduces to X^fc^o ~~ u k\ 2 - However, as k runs from — oo to oo, this 
sum diverges to oo. Thus we work with only the n moments, to get a solution belonging 
to L 2 (0, 27r). That is we try to minimize, s ^!k-- n /2 ~ u k\ 2 - Note that the empirical 
Fourier coefficients is not absolutely continuous with respect to the Haar measure. Hence by 
Bochner's Theorem, the solution to this problem derived as the empirical Fourier coefficients 
is not square summable. Thus, we must add the penalty term to the problem, viz, A||u|| 2 
to get a square summable solution. Now, square summability of the solution implies the 
existence of a density / by the Fourier Inversion Theorem [10] . 

Thus the Fourier coefficient Uk can now be written as 



1 



2- 



u k = x k + iy k = ^- I e ikx f{x)dx (34) 
^ Jo 

Note that U- k = Uk- Furthermore, by inverse Fourier transformation, and considering that 
the density is real, we can say 

oo oo —1 

/(*)= Y u k e- ikx = l + Y,u k e- lkx + Y, 

k=—oo k=l k=—oo 

oo 



k=l 

oo 

l + 2VRe{(x fc + m )e- lb } 



Thus we have, 



fix) = 1 + 2 Yji x k cos(kx) + yk sin(A;x)} (35) 



k=l 



Hence now, the penalty term become A J f (x) 2 dx, and the problem now becomes, to 
find the minimizer of 



2n 

\Y(x) - f(x)\ 2 dx + A / /" (x) 2 dx (36) 



1 

in the class L 2 (0,27r). Equivalently, the problem in the smoothing spline format becomes, 
the minimization problem of 

oo oo 

Y \u k -u k \ 2 + x{Yk\4 + y 2 k)} (37) 

k=— oo k=l 

Thus, we get a 1-1 correspondence between the curve fitting problem and the density 
estimation problem using the Fourier basis. Thus the Fourier Spline technique is a method 
of solving the Smoothing Spline problem, where the solution is obtained using the Fourier 
basis. Now we proceed to actually solving this minimization problem. 



2.2 Solution to the Minimization Problem 

Theorem 2. Solution to the minimization problem X^fcL-oo — n fc| 2 +MSfcLi ^ 
gives rise to a kernel like density estimate f n (x) = 1 + X)jjfcl=i ^kCk{X)e~ lkx , where Cfc(A) = 
1+ ^ fc4 , obtained by the convolution of the empirical Fourier coefficients, with the kernel 
K(x) = 1 + ££|=i C fe (A)e-^ = 1 + 2 5Xi f^p /or x E [-tt, vr) 

Proof. From equation (26) we get, 

n/2 

/ n (x) = 1 + 2 cos(£;x) + y fe sin(/ca;)} (38) 

fc=i 

Differentiating this twice, we get, 

n/2 

f n (x) = —2 ^^{xfcA; 2 cos(kx) + yf.k 2 sin(/cx)} (39) 
k=l 

We need to minimize ^^=-11/2 ~ Ufc l 2 MSfc=i ^ 4 ( x k + Vk)}- That is 

^|(co^p)-x fe ) +(shTp)-y fc ) |+A|^fe 4 ( 
fc=i ^ ^ ^ fc=i 

Differentiating this with respect to Xk and yk for each k, and equating to zero, we get, 



xz, = — - and % = — '- (40) 

fe 1 + Afc 4 y 1 + AA; 4 17 

Now let Cfc(A) = 1+ ^ fc4 , Thus the estimated Fourier coefficient is UfcCfc(A). Now f(x) = 
1 + 5Z|fc|=i Uke~ tkx . Thus f n (x) = 1 + S|fc|=i UkCk{\)e~ lkx , which is the seen as the convo- 
lution with the kernel K{x) = 1 + £j£ |=1 C fc (A) e - ifcx = 1 + 2 JXi fw for x e t _7r > 7r )" 

Hence, proved. 



Remark : Note that our kernel may not be non- negative, for very small values of A. The 
Figure 4, shows plots of the kernel K(x), defined as, 

00 , 

K(x) = 1 + 2J2 fOT * e *) ( 41 ) 

k=i 

for different values of A. It can be seen from the figure that for A < 0.8, the Kernel does 
take negative values. However, such situations are not out of the ordinary. A study of the 
literature reveals that such kernels present in the class of higher order kernels have been 
used for bias reduction in kernel density estimation |16|I17| . They have also been used in 
non-parametric curve estimation as they often give faster asymptotic rates of convergence. 

With this estimate of density at hand, we proceed to the calculation of the mean square 
error as well as the optimal rate and choice of the penalty parameter A. 



1 = 0.01 



1= 0.05 
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Fig. 4. The Kernel with different values of A. Note that for values of A more than 0.8 we get a non-negative 
kernel. But for smaller A, the kernel does take non-positive values, as seen from the above Figures. 



Theorem 3. For the density estimate obtained above, we have, 



E|/nOc) - f(x)\ 2 = Y,Y, u ^k'(C k (\) - 1)(CV(A) - l) e -^- fc > 



k k' 



C k {X)C k i{\) u a „-i{k-k')x 



+ 2^ /_> (Uk-k' - u k u k >) e 



n 

k k' 



Proof. From the expression of /„ obtained from Theorem 2, we get, 



\f n ( x )-f(x)\ 2 = { £ {u k C k {\)-u k )e- lkx } x { Y, (u k C k (\)-u k )e~^} 
\k\=i \k\=\ 

= E E ^kC k (X) - u k ) (u k ,C k ,(\)-u k ,)e-^ k - k > 



k k> 



E\f n (x) - f{x)\ 2 = ££E(u fc C fc (A) - u k ) (u k ,C k ,(\) - u k ,)e^ k - k '> 
k k' 

= E E E (u k C k (\) - u k ) {frC v (A) - u*7) e "*(*-*')* 



fc k< 



fc fe' 



Uk 



3=1 



j'=i 



D ~i(k—k')x 



Now the term inside the expectation can be written as 



E 



1 n 1 f 1 n 1 

- E ( Z j ~ Uk ) + Ufe \ C kW ~ u k \ E ( Z f ~ + Ufc ' f^'W - 

71 7 = 1 ' ^ 71 7=1 ' 



Replacing ^ Sj=i ~~ by the expression reduces to 



E{(R k + u fc ) Cfc(A) - u k }{{R k > + u k >) Cy{\) - u k ,} 
= E(R k C k (X)+u k (C k (X) - 1)) {R«C k ,{X) + u k ,{C w {X) - 1)) 
= C k {X)C w {X)E(R k R k ,) + u k (C k (X) - l)u k ,{C k ,(X) - 1) 

= -C fc (A)C fc /(A) (?j fe _ fc / - UfcMfe/) + n fc (C fc (A) - l)u k ,(C k ,(X) - 1) 
n 



Thus we get, 



E|/ n (x) - f(x)\ 2 = J2J2 u ^'(C k (X) - 1)(CHA) - lje"**-*')* 



fe fc' 



\ \ ^ Cfc(A)C fc /(A) -i( k -k')x 



k k' 



n 



Hence Proved. 



Theorem 4. The optimal rate of the MISE is n 4 / 5 , which matches the rate for the original 
kernel smoothing problem. 



Proof. Using the expression for the mean square error, we calculate the integrated mean 
square error as, 

f 2 \\f n (x) - f(x)\ 2 dx = y j Tu k u k ,(C k (\) - 1)(CHA) - 1) r e -^ k - k >dx 

J ° k k' J ° 

+ E E Ck{X) ^' iX) {u k -k> - u k u k .) r e^ k -^d X 
k k' n Jo 

= 2ttE u k u k {C k {\) - l) 2 + 2vrE ^2(1 - u k u k ) 
k k n 

= 2vr E \u k \\C k {X) - I) 2 + 2vr E C k {Xf {l -^ k?) 
k k 

First note that if A — > 0,C k (X) — > 1. which implies the first term goes to zero while the 
second term diverges to infinity. Thus we have with us a situation which is similar to the 
bias- variance trade off situations. In order to find the optimal rate we to equate the rates 
of these two terms. Now note that the 2nd term of the above expression can be written as 



^— ' n n 

k 

where 

which can be thought of an expectation of a random variable which takes values 27r(l — \u k \) 2 

C (X)' 2 CV(A) 2 

with probability, ^ c k l\)i - Since this is bounded, the rate depends on n > which is 
clearly 1/ (nA 1 / 4 ). Similarly when we work with the first term, we see that 

E(c fe (A) - i) 2 ki 2 = e (1 + Afc4)2 fc V' 2 = E fe4 i^i 2 x E (1 + k Xk A ? Pk 

where 

^ <43) 

Note that the term ^2 k k A \u k \ 2 = 5 (say) is an approximation to J f (x)dx. Thus the 1st 
term can be written as 

\ 1.4 

2ttE(^(A) - 1) 2 |^| 2 = A x B E (1 + Afc4)2 P fc (44) 

which has a decay rate of A. Thus the optimal rate is obtained by equating these two rates. 

' = A n => X n = n" 4 / 5 (45) 



Hence Proved. 



Furthermore, the optimal MISE can be expressed as, 

MISE opt (f) = 2vr^ \u k \ 2 (C k {\) - if + 2vr ^ C k {\) 2 (1 ~ ^ (46) 

k k 

where A is obtained numerically over by a grid search which minimizes the above equation. 
With this methodology, we are able to estimate the density function at a point x using 
the optimal penalty as A, obtained by the above mentioned procedure. Before we show the 
simulated results on the various different samples, we briefly describe the methodology for 
using the cos 2 kernel. 



2.3 Using cos 2 Kernel 

Let Zi, Z2, . ■ . , Z n be the observed data on S 1 . Let K denote a smoothing distribution on 
S 1 . Let fi, & , • • ■ , be i.i.d. K. Define Z[ = Z' 2 = Z 2 &, ■ ■ ■ , Z' n = Z n £ n . Note that 

P{Zi eA) = -J2P(^Z i A) = -J2 f K(uW{u) 

Thus the density estimate at some point a is given by ^ X^i K(Zia). Note that we can 
transform each Zi to 0j G [0, 2tt) and each point a to e tx . Thus for each x G [0, 2n) the 
density estimate is ^ Ylj K(e i( - X ~ 9 ^). Let K(x) = K(e tx ). One such choice of K(x) can be 
defined as follows 



K(x) 



2mcos {mx) if \mx\ < f 

7T 11 — 1 



otherwise 

m/2m 2/ 



Since note that f_ n /™ m cos 2 (mi) = ^ • Thus the density estimate at point x boils down to 

1 2m n 

f n (x) = - x — J^cos 2 {m(x - 6'j)}J| m(a _ e . ) |< 7r/2 
i=l 



Another estimate using inverse Fourier transformation, i.e. for Z^ m we get, 

J 

where the expectation is calculated through the cosine density. 



k=—oo \ j=l / 



2.4 Mean Square Error Calculations 

We have performed detailed comparative simulation study to see which method produces 
the best results. The methodologies followed for the comparative study are as follows : 

— Using Fourier Spline Methodology 

— Using the cos 2 kernel using plug-in technique for bandwidth 

— Using usual KDE with cross-validation technique for bandwidth selection 



— Using usual KDE with plug-in technique for bandwidth selection 



We have considered a wrapped mixture normal model on the circle with two modes 
at 6 and —0, where 9 € [0, 7r/2). We take n = 50 observations and calculate the density 
using the four methodologies. The Epanechnikov kernel has been used in the usual KDE 
methodologies. Since in the method based on Fourier Spline, the smoothing parameter A 
for the kernel, does not work as the usual linear scaling parameter, we consider the cos 2 
kernel as an alternative comparison by using the usual plug-in estimate. We repeat this 
procedure for N = 10 6 times and calculate the mean squared errors. All simulations have 
been performed on MATLAB 7.8.0. The results of the simulation study are shown in Table 
2. 

It can be seen that when we are considering a bi-modal continuous distribution, the 
estimate obtained through the Fourier Spline has a smaller MSE when compared to all 
other methods. 



Table 2. Mean Square Errors at x = 



6 


Fourier 


cos 2 


KDE Cross- 


KDE 


Values 


Spline 


Kernel 


Validation 


Plug- in 


0.10 


0.0041 


0.0056 


0.0086 


0.0101 


0.15 


0.0033 


0.0049 


0.0069 


0.0077 


0.20 


0.0041 


0.0053 


0.0116 


0.0103 


0.25 


0.0028 


0.0038 


0.0086 


0.0105 


0.30 


0.0029 


0.0037 


0.0087 


0.0107 


0.35 


0.0040 


0.0053 


0.0099 


0.0085 


0.40 


0.0033 


0.0042 


0.0119 


0.0095 


0.45 


0.0022 


0.0028 


0.0061 


0.0083 


0.50 


0.0023 


0.0029 


0.0070 


0.0042 


0.55 


0.0028 


0.0034 


0.0080 


0.0094 


0.60 


0.0033 


0.0037 


0.0095 


0.0100 


0.65 


0.0025 


0.0029 


0.0063 


0.0055 


0.70 


0.0018 


0.0023 


0.0044 


0.0067 


0.75 


0.0021 


0.0021 


0.0057 


0.0060 


0.80 


0.0022 


0.0025 


0.0047 


0.0071 


0.85 


0.0019 


0.0020 


0.0046 


0.0074 


0.90 


0.0002 


0.0019 


0.0047 


0.0059 


0.95 


0.0027 


0.0027 


0.0057 


0.0087 


1.00 


0.0024 


0.0025 


0.0058 


0.0101 


1.05 


0.0018 


0.0019 


0.0028 


0.0047 


1.10 


0.0016 


0.0017 


0.0040 


0.0041 


1.15 


0.0018 


0.0018 


0.0041 


0.0101 


1.20 


0.0014 


0.0013 


0.0023 


0.0043 


1.25 


0.0017 


0.0018 


0.0026 


0.0025 


1.30 


0.0014 


0.0015 


0.0022 


0.0029 


1.35 


0.0015 


0.0016 


0.0023 


0.0023 


1.40 


0.0015 


0.0016 


0.0019 


0.0026 


1.45 


0.0013 


0.0012 


0.0015 


0.0028 


1.50 


0.0016 


0.0017 


0.0018 


0.0039 


1.55 


0.0015 


0.0015 


0.0017 


0.0100 



3 Detection of Local Features 



Let T = [0, 2ir) = |J A n j where A nj 



27TJ 27T(j + l) 

2 n ' 2 n 



for j = 0, 1, . . . , 2 n — 1 is a partition 

of the circle. Note that 2 n A nj - 2vrj = [0,2vr). Now for y G [0, 1), / ( 27r( 2 J ^ ) ) is the local 
information of / on A n j. P n is a measure on [0, 2tt). Let the sample from A n j be denoted 
by S n j and 5 n = Uj^ 1 &nj- Thus 2 n S n j — 2irj maps 5 n j to [0, 2tt). As n becomes larger 
the information becomes more and more local. 

As mentioned before, we can see that because of Theorem 2, the localisation property is 
lost as the density is calculated by a weighted average of these local information. Hence the 
usual Fourier Spline estimate does not give a good estimate of the local features. Thus, in 
this section we develop the methodology of detecting the local features. Estimation of the 
local features is explained in details in Section 4. 

We use the information present in each A n j to detect the presence of discontinuity or 
edges. We shall now delve into the details of those procedure starting with the generalised 
testing procedure. The distributional details for testing specific local features are explained 
subsequently. 



3.1 The Testing Procedure 

We first fix an n large enough. Suppose we want to test 

Hq : There is some local feature in A n j vs. Hi : Hq is false 

The exact testing procedure will depend on the local feature that we are trying to 
determine and is explained later in sub-sections 3.2 - 3.3. Suppose T be the test statistic, 
and based on this we can find the p- value of the test, let us denote that by PA nj - Note 
that the same test can be performed on the previous layer, that is, on the partition formed 
by ^4( n _!)j and subsequently on earlier layers. Thus we have a nested layers culminating 
in the last layer. It might be possible to get contradictions, for e.g. the test is rejected at 
partition level n, it might be possible that the test is accepted at partition level n — 1. So, 
we can consider having a multivariate p-value profile, where the i-th co-ordinate is the p- 
value corresponding to testing on A^ . This poses a problem of inferring from a multivariate 
p- value. 

We suggest a procedure for the same. Note that if we had a single p- value for all the units 
in the partition we could perform the Holm's procedure of multiple testing [flj to give us the 
regions which contain local features at a some level a. However instead of a single p-value, 
for each unit in the partition we have a multivariate p- value profile, say (pi, ...,pk)' ■ Now 
under null, the p-value usually follows a uniform distribution, but under the alternative, 
it can be modelled well using the Beta(a, /3) distribution with a high value for a. Thus 
we can assume, pi,.-.,Pk are i.i.d Beta(a, f3). Thus an obvious sufficient statistic is the 
geometric mean, i.e. (Il^iPi) 1 ^™- However instead of using the usual geometric mean in 
the Holms procedure, we use a weighted geometric mean, where the weights are chosen 
to be in a decreasing sequence with the increase in the partition level. We came to this 
conclusion after extensive simulation study. That is, for each j, we have with us a p- value 



profile (pi, . . . ,Pk), and we consider the sufficient statistic as 

P 3 =X\pT (47) 
i=i 

where 53i=i ^ = 1 an< ^ { w *}i<i<jfc ^ s a decreasing sequence, for j = 0, . . . , 2 n — 1. Using, 
these Po, ■ • • , we apply the Holm's procedure which can be explained as follows. 

3.1.1 Holm's Procedure The Holm procedure [26J can conveniently be stated in terms 
of the p- values Po, . . . , i^-l of the 2 n = s(say) individual tests. Let the ordered p- values 
be denoted by Pn\ < ... < P( s ) , and the associated hypotheses by Hm , . . . , Hr s \ . Then the 
Holm procedure is defined stepwise as follows: 

Step 1. If Pm > a/s, accept Hn ),..., Hr s -\ and stop. If Pm < a/s reject Hn\ and test 
the remaining s — 1 hypotheses at level a/(s— 1). 

5tep 2. If P(!) < a/s but P( 2 ) > a/(s — 1), accept Hr 2 ), • • • i an d stop. If Pm < a/s and 
P( 2 ) < a/(s — 1), reject i?( 2 ) m addition to Hm and test the remaining s — 2 hypotheses at 
level a/(s — 2). And so on. 

Thus by this procedure we shall be able to detect the exact regions which contain the 
local features. Now we describe the exact tests for detecting the individual local features 
starting with support and outlier detection. 

3.2 Support and Outlier Detection 

This is the first step towards the detection of local features. Note that we have with us the 
data which is most likely following a mixture distribution from a discrete and a continuous 
distribution. So it is likely to get a few discrete observations on the edges of the support. 
Our aim is to find all such positions so that we can eliminate such outliers and then later 
incorporate them again to give the final estimate. 

Now when we consider each A n j, observe that if the support begins at some point 
xq G A n j (without loss of generality we consider the support to be the right of xq) then we 
expect a very small number of points to the left of xq, while an abundance of points to the 
right of xq. Thus we could model the left of x$ by B(n,c/n) which converges to a Poisson 
number of points, while the right side follows a multinomial distribution, with probability 
directly proportional of the length of the arc. Note that here the optimal test would be 
testing Hq : A = vs H\ : A > (where A is the parameter of the Poisson Model). However, 
testing this situation is very risky, because A = denotes a degenerate distribution at 0. 
Hence, if somehow we get a single value to the left of xo, our test is rejected. In order to go 
around this situation, we consider the B(n, c/n) and test for Hq : c < 1 vs. H\ : c > 1. 

Hence our algorithm for testing Hq : End of upport lies in A n j vs. H% : Hq is false, is as 
follows : 



Step 1 : Denote xq as the centre of A, 



Step 2 : Count the no. of obsevations to the left of xq, call it T. 
Step 3 : Accept the Null hypothesis if T < 1, otherwise reject it. 

Step 4 : The p- value of the test is 1 — F(T), where F is the cumulative probability distri- 
bution for B(n, 1/n) 

Note that this exact same test works for testing Hq : There exists outliers in A n j vs. Hi : 
Hq is false. Note that we do this test for every partition layer and using the procedure 
mentioned in section 3.1 we finally get in the which units of the partition, the test is 
rejected at level a. Thus, with this information, we are able to identify the exact region of 
the support and the points of outliers. So, we keep track of such regions and remove them 
from the data. Thus now we work in the support of the distribution and proceed to the 
detection of discontinuities and edges. 

3.3 Discontinuity and Edge Detection 

Note that the regions of discontinuity or edges can be suitably modelled using the exponen- 
tial family, 

f(x\/3) <x exp {(3 x + p x I{x > x ) + /3 2 (x - x ) + } (48) 

where xq is the mid point of the region A n j in question, / is the indicator function and (•)+ 
is a positive function which take the argument value if it is positive and otherwise. Note 
that here /3i is the parameter which controls the jump discontinuity, while /?2 controls the 
edge effects. We first test the existence of discontinuity in a particular unit of the partition, 
say A n j. If, there is no discontinuity we proceed to testing if the A n j has an edge or not. 
Thus the algorithm for testing the presence of discontinuity or edge is as follows: 

Step 1 : We first test Hq : There is no discontinuity in A n j vs. Hi : Hq is false. This is 

equivalent to testing Hq : f3i = vs Hi : (3i ^ 0. 
Step 2 : We proceed by the usual Likelihood ratio test for exponential family. Our test 

statistic is 

. su P/3gHo rL ig s nj /(^Ifi) fAQ . 

Note that the supremum is easily obtained by using the MLE of the parameters which 
are easy to obtain experimentally since both the numerator and the denominator belong 
to the exponential family. [25] 

Step 3 : Now T = —2 log A follows a Chi-Squares distribution with degrees of freedom 
3 — 2 = 1. Thus, we reject the test if T > Xi. a /2, where A" 1:Q ,/ 2 denotes the (100 — a/2)-th 
percentile of the chi-squares distribution with degrees of freedom 1. 

Step 4 : If the test is rejected, that is, there is discontinuity in this region, we stop, and 
report the p- value as 1 — F(T), where F is the cumulative probability distribution of 
the chi-squares with degrees of freedom 1 . If no discontinuity is detected we proceed to 
Step 5. 

Step 5 : Now we test Hq : There is no edge in A n j vs. Hi : Hq is false. This is equivalent 

to testing Hq : (3 2 = vs Hi : (3 2 + 0. 
Step 6 : We proceed exactly in same way as in Steps 2 and 3. 

Step 7 : Finally we report the p- value as 1 — F(T), where F is the cumulative probability 
distribution of the chi-squares with degrees of freedom 1 . 



As before, we do this for every partition layer and using the procedure in Section 3.1, we 
finally detection the regions which contain discontinuity or edges. Thus we are able to break 
our support into compact intervals, which can be clearly categorised into sets containing 
local features such as discontinuity or edges and sets where the density is smooth. Using this 
knowledge we can proceed to the overall density estimate as will be explained in Section 4. 



4 Overall Density Estimates 

In this section we describe how we obtain the overall density estimate using the idea of 
partition of unity. Note that using the methodology described in Section 3, we will be able 
to detect regions of local features, in the support of the density as well as regions of outliers. 
From these regions we can create an open cover of the support of the density. Thus based 
on the theory of Partition of Unity, we can find a partition using the function, 

p(x)= /ex P (-tan^) -a < X < a 
I otherwise 

Note that this function is smooth and takes positive values in (—a, a) and outside this 
support. Thus with appropriate choice of a we can always construct a partition of unity, 
say {pi}i£[ using this function. We shall use this partition of unity to estimate the density 
in the concerned regions. 

Note that as explained before, suppose we find n regions which possess a local feature, 
and another region where the density is smooth. Let {U} denote these open sets, and 
{Pi}iei denote the partition, such that supp pi C C/j. Using this notation, our whole density 
can be partitioned as 

n / n \ 

f(x) oc Pi{x)9{x) + 1 - E K x ) (51) 

i=l V i=l J 

where each pi(x)g(x) estimates the local feature in the set U, while (1 — J27=i Pi( x )) h( x ) 
corresponds to the smooth region which is estimated using the Fourier Spline technique 
as explained in Section 2. Note that pi takes the value ourside U, thus we can find the 
normalising constant by integration. Thus our final density estimate is 

= E£=i Pi(x)g(x) + (1 - J27=i Pi(. x )) K x ) ( 52 ) 
It £?=i Pi( x )9(x) + (1 - £?=i Pi(x)) h(x) dx 

Thus we start by estimating the local feature. 



4.1 Estimation of Local Feature 

We have with us a region which is the support of pi and we wish to estimate the local 
feature pi(x)g(x). Consider all the points lying in the support of pi, i.e. consider all x £ U. 
We fit the local exponential model given by, 

, . . = exp{/3 :r + Piljx > gg) +/3 2 (x-x ) + } 
9{XlP> J Ui exp{f3 x + f3 1 I(x>x )+^(x-x ) + } { ' 



using the data in Ui. Let J v exp{/3 x + ft/(x > xq) + ft(x — xo)+} = A(ft,ft,ft), (say). 
Then the log-likelihood function can be written as 



l{Po, Pi, ft) = ft x + & E ^ > + & E ( x " " n lo S A (Po, ft, ft) (54) 

x£Ui xeUi x£Ui 

where n is the cardinality of Ui. From this, the maximum likelihood estimates of (/Jo, ft, ft) 
can be obtained, numerically. Thus, the final estimate for the local feature at Ui is 



cxp 



^ jftx + ft/(x > x ) + ft(x - x )+\ 

Piix) x ^ J- (55) 

A(/? ,ft,ft) 



where (ft, ft, ft) denotes the maximum likelihood of the parameters. This procedure is 
repeated for all the regions which have been detected to possess local features. Now we 
proceed to adapting the method based on Fourier Splines to estimate the smooth portion 
of the density. 



4.2 Estimation of Smooth Feature 

We are left with estimating the smooth part of the density, i.e. (1 — Y17=i Pi( x )) f( x )- Sup- 
pose we have the data as Z±, Z 2 , . ■ ■ , Z m lying on the circle. Treating the required function 
as a smooth function which we need to estimate, we can see that the empirical Fourier 
coefficient is given by, 



f{x) dx = / e lkx 1 - V fH (x) dF m (x) (56) 





Thus, this simplifies to, 

Thus, instead of the usual average of the Z\ we get a weighted average of Z\. The sample 
moment, in this situation becomes weighted sample moments, where the weights are pro- 
portional to where the data is coming from. If Zj comes from a region which has a local 
feature, say Ui, then its weight in the smooth region decreases from 1/m to (1 — pi(Zj))/m. 
On the other hand, if Zj belongs to the smooth region, then by the definition of the partition 
{pi} we have, Pi(Zj) = V i £ [l,n]. Thus the weight remains as 1/m. 

Now following the same procedure as in Section 2 we get the estimate of the smooth 
function as 

m/2 

/ m (x) = 1 + J2 u k C k (\)e- ikx (58) 
1*1=1 



where 



Tfl / Yl \ 

j=l \ 1=1 / 



The optimal choice of A is obtained by minimizing the MISE given by equation (37). Thus 
the final estimate is 

TIl=lPi{x)gi{x) + fm(x) 
Jo" ££=1 Pi( X )9i(x) + fm(x) 



/» = rt i =\ l K T:r;T J T: , ( 59 ) 



where Pi{x)cji{x) and f m (x) is obtained from (46) and (49). This completes the theory of 
estimation of density on the circle, taking into account the local features. We now proceed 
to the section on simulations to highlight how our methodology works in different situations. 



5 Simulation Study 

In this Section we apply our methodology, on three test data sets, each showing a particular 
aspect of our Methodology. The Usual Kernel Density estimates are also applied on the 
data sets, and a comparative study of their MISE is provided. 



5.1 Density with Discontinuity and Edge Effects 

We use the same density which we used to portray the existence of such local features in 
Section 1. Our data is coming from a mixture distribution of a Uniform and a Triangular 
Density, with equal mixing proportions. Thus, we consider X\, X2, . ■ ■ , X n coming from 
a equal mixture of Unif ( zl f ZL , = £) and Triangular (| , ^) with a peak at |. We take 
n = 1000 and execute the algorithm, starting with the detection of support and outliers. 

We have broken the circle into 2 9 regions and we work with the data on these intervals. 
First the detection of support, gives us almost the exact regions, starting from a little left of 
=1^ to a little right of , and from a little left of \ to a little right of . We then proceed 
to the detection of discontinuities and edges. We get two points of discontinuity and two 
edge points. Breaking the data into the rest of the region, we estimate the smooth region 
using the Fourier Spline technique, and fit the exponential model to estimate the regions 
showing the local features. The final estimated density is shown in Figure 5 along with the 
usual kernel density estimate. The whole procedure is repeated 10 5 to find an estimate of 
the mean integrated squared error. We also calculate the MISE of the estimates of the local 
features, which is tabulated in Table 3. 



Table 3. Mean Integrated Square Errors 



Estimates 


Using Fourier 
Spline 


Using usual 
KDE 


Local Feature 
at -3tt/4 


Local Feature 
at — 7r/4 


MISE 


0.0058 


0.0112 


0.0049 


0.0051 



Now, note that, the support of the distribution has been very well identified in estimate 
through Fourier Splines. Furthermore, all points of discontinuities and edge effects has been 
correctly identified, allowing us to apply the smooth function estimation technique on the 
remaining set. The vertical dotted lines show the points detected as having local features. 
Finally we estimate the local features using the exponential model. The MISE of these local 
estimates are also small as seen from Table 3. Note that such positions of local features 
have not been identified in the estimate obtained from the standard KDE procedure. Thus, 



Using Fourier Splines Using usual KDE Actual Density 




Fig. 5. The Fourier Spline Estimate along with the Kernel Density estimate and the Actual Density. 

the estimate from the Fourier Spline is indeed much more closer to the actual density. This 
claim is further justified from Table 3, where we see that the MISE from the Fourier Spline 
is much smaller than the MISE from the usual KDE. 

5.2 Mixture Density of a Continuous and Discrete Distribution 

Here we consider data from a mixture density of a normal distribution with outliers coming 
from a discrete distribution. We have considered X\,X2, ■ . ■ X n to come from a mixture of a 
normal distribution with a mode at truncated to [—§,§) with probability 1 — e, and with 
probability e from a discrete distribution taking values {— 37r/4, 37r/4} with equal proba- 
bility. We have chosen e = 0.05, n = 1000 for our simulations. Exactly similar procedure 
is carried out to reach the final estimate shown in Figure 6, along with the usual kernel 
density estimates. The whole procedure is repeated N = 10 5 times to get an estimate of 
the mean integrated squared error. We also calculate the MISE of the estimates of the local 
features. Both results are tabulated in Table 4. 



Table 4. Mean Integrated Square Errors 



Estimates 


Using Fourier 
Spline 


Using usual 
KDE 


Local Feature 
at -3tt/4 


Local Feature 
at 3tt/4 


MISE 


0.0041 


0.0145 


0.0009 


0.0010 



As before, the support has been very well detected, which allows us to accurately es- 
timate the smooth function, which is not the case from the KDE as seen from the second 



Using Fourier Spline Using usual KDE Actual Density 




Fig. 6. The Fourier Spline Estimate along with the Kernel Density estimate and the Actual Density. 

figure. Furthermore, the outliers has been almost perfectly detected as can be seen through 
the dotted lines in Figure 6. Those regions have been estimated through the local exponen- 
tial family. Note that the presence of the outliers creates humps in the estimate from the 
KDE, which fails to identify the discrete nature. The very fact, which motivated us to work 
on this problem, and we have been able to successfully create a work about for the problem. 
As before we can see that our estimate through Fourier Spline, has a smaller MISE when 
compared to that from the standard KDE. 



5.3 Density with Several Discontinuities 

We consider another example where there are more than 2 discontinuities in order to check 
whether our methodology is successful in detecting them all or not. So we consider a data 
coming from a density which can be written as 

\l for *€[-§,-!) 
f{x)=l± forx€[-f,f) (60) 
U for *e[f,f) 

This density has four points of discontinuity, viz, — §,— f,f and ^. We draw n = 1000 
observations from this density and perform the procedure as done in the last two examples. 
The final density estimate is given in Figure 7. The whole procedure is repeated for N = 10 5 
times to get the mean integrated square error. MISE for the estimated local features are 
also calculated. Results of which are tabulated in Table 5. 

Here too, the support of the distribution has been perfectly detected. Note further that 
the 3 dotted lines in the first figure of Figure 7, shows that our algorithm has successfully 




Fig. 7. The Fourier Spline Estimate along with the Kernel Density estimate and the Actual Density. 



Table 5. Mean Integrated Square Errors 



Estimates 


Using Fourier 
Spline 


Using usual 
KDE 


Local Feature 
at -tt/2 


Local Feature 
at 7r/4 


Local Feature 
at 7r/2 


MISE 


0.0036 


0.0083 


0.0020 


0.0031 


0.0027 



detected 3 local features, mainly discontinuities at — 7r/2,7r/4 and tt/2. However, it has 
failed to detect the discontinuity at ir/4. This can be explained by the fact that the jump 
discontinuity in modulus is small at — 7r/4 when compared to the other jump discontinuities 
in the density. Thereby, its chances of detection is indeed very small. However, our algorithm 
has been able to detect discontinuities with a moderately high jump. 

Now on when we look at the estimate through the Kernel density estimation, we see 
that it has neither been able to detect the support nor the local features accurates. As a 
results we can see from Table 5 that the MISE of our method, here too, (although it has 
failed to detect a discontinuity) is far better than MISE of the estimate obtained through 
the standard method of Kernel Density Estimation. 

6 Real Life Data 

Here we apply our methodology on a real life data obtained from Sengupta and Rao (1966) 
[24j . They have provided the data coming from Cross-bedding Azimuths in the 3 Units of 
Kamthi River. This data is given in Table 6. We have used the data from Upper Kamthi, to 
show our Methodology. Since we are given grouped data, we have considered in each group 
they follow a uniform distribution, with minor rotation in each bin to get more uniformity. 
The histogram of the data, along with the Fourier Spline and standard KDE estimates are 
shown in Figure 8. 

In Figure 8, the gray line depicts the estimate from the standard KDE, while the black 
depicts the estimate from Fourier Spline. Here too we have been able to almost perfectly 
detect support. The extreme regions which have no data, has been correctly identified. 



Table 6. Frequency Distributions of Cross-bedding Azimuths in the 3 Units of Kamthi River (Sengupta and 
Rao, 1966) 



Azimuth (in degrees) 


Lower Kamthi 


Middle Kamthi 


Upper Kamthi 


- 


19 


14 


50 


75 


20 - 


39 


14 


62 


75 


40 - 


59 


11 


33 


15 


60 - 


79 


13 


9 


25 


80 - 


99 


9 


1 


7 


100 - 


i in 
119 


16 


3 


3 


120 - 


139 


u 


u 


Q 
O 


140 - 


159 


4 








160 - 


179 











180 - 


199 


3 








200 - 


219 


4 


2 


21 


220 - 


239 





8 


8 


240 - 


259 








24 


260 - 


279 





11 


16 


280 - 


299 


6 


5 


36 


300 - 


319 


7 


20 


75 


320 - 


339 


1 


53 


90 


340 - 


359 


21 


41 


107 



Furthermore, few local features such as outliers has also been identified by the procedure 
at 2.09. As seen from the figure, our estimate is much more closer to the actual probability 
distribution than the standard KDE. 



7 Discussions 



7.1 Regarding the Spline Smoothing on Circles. 

Note that although we have got an estimate of our density function by solving the Spline 
Smoothing on Circle, our solution is not the exact solution because, the solution of the 
Spline Smoothing problem must belong to the space of non-negative definite sequence. If our 
solution was indeed non-negative definite, then by Bochner's Theorem, the corresponding 
Kernel would have been non-negative. But as seen from Figure 4 in Section 2, this is not the 
case. Thus, solving the Spline problem on circles with the non-negative definite constraint 
is one of the open problems which should influence researchers to further work in this field. 



7.2 Regarding Estimation of the Local Features 

In this article, we have estimated the local features using the exponential model, and then 
multiplied it with the corresponding function p(x) where p G {pi}iel , a partition of unity, 
created on the set of open covers using the function 

, , I exp (-tan 2 £r) —a < x < a , N 

p(x) = { ^ K 2a! (61) 

I otherwise 

for different choices of a. Let us explain this in a bit more details. Suppose, we have detected 
a local feature in A n j (using the same notation as before), then A n j forms a unit of the 




Fig. 8. Application of Fourier Spline Methodology on Real Data 



open cover. The rest can be chosen so that they form an intersecting set. Now, suppose, pi 
is has its support in A n j. So, in order to estimate the local feature, what we have done is 
we have first estimated the exponential model using the data in A n j and then multiplied 
that with pi. The final estimates is done through proper normalisation. 

Now instead of doing this, another way of estimating the local feature could be estimating 
the whole function pi(x) x g(x), where g{x) a density from the chosen exponential family. 
That is, we can find the optimal values of (/3n, P2) by maximizing the likelihood given by 



Hpo,P1,P2)- II Pi{Xj) x — , , r (oij 

,ti J a exp{/3 x + /3i/(x > x ) + (3 2 {x - x ) + \ 

instead of maximising, 

T(R R R\ TT e x P {Ppxj + Pxljxj > zg) + h{xj - x ) + } 

L{Po,Pi,P2)- II -j. j— r— r (53) 

J A . exp{^ x + p 1 I(x > x ) + P2(a; - x )+| 

This would give us another estimate and we could do a comparative study on these different 
methods to see which method gives the best estimate of the local features. 



8 Conclusion 



In this article, we have developed a new methodology for estimating Circular probability 
distributions, using Spectral Isomorphism to solve an equivalent Spline Smoothing Problem 
on the Circle. 

There has been already a lot of literature which aims at density estimation on the Circle 
however most of them are based on Kernel Density Estimation with different modifications. 
But, working on the problem with the knowledge of presence of local features has hardly 



been attempted. Since during smoothing of the kernel density the local feature gets lost, it 
is very difficult to capture these using the conventional kernel density estimation. 

We develop the methodology based on the usual Spline Smoothing problem, which tries 
to solve the Curve Fitting problem. In our case, because of the form of the Fourier coeffi- 
cient on the circle, the minimisation problem for finding the density in the I? space, gets 
transformed to the Spline Smoothing problem on the Circle. The solution to that problem, 
via Bochner's Theorem can be transformed to a Kernel which has a square integrable second 
derivative. Thus, the density can be obtained through the inverse Fourier transformation. 
We have been able to solve the Spline Smoothing problem on the circle and also derive 
several results on the resultant density estimate. Estimates for the Bias and MSE has been 
calculated. The optimal rate of the density has been proved to match the optimal rate in 
the standard Kernel Density Estimation. Finally our methodology has been compared with 
other standard methods as a comparative study. 

Another major focus of this work has been in the detection and estimation of local 
features such that outliers, discontinuity and edges. The testing problem is different for the 
concerned local feature however, the overall multiple testing has been performed using a 
novel application of the Holm's procedure as well as a novel technique of working with a 
multivariate p-value profile. 

The Support and Outlier detection has been done by considering a B(n,c/n) model, 
while discontinuity and edge detection has been performed by considering an exponential 
family. The final overall estimate is obtained by considering a partition of unity of a special 
function. The local functions has been estimated through exponential model and the smooth 
function is through a weighted adaptation of the Fourier Spline methodology. 

Lastly, we have applied our technique to a few artificial densities and to a real life data as 
well. In all situations, this technique proved to be a much better estimate than the standard 
kernel density estimates. We also have a few open problems for further research, especially, 
the problem of solving the Spline problem on circle, restricted to the non-negative definite 
solution. As well as the comparative study of the different techniques of estimating the local 
features. We are also focusing our attention to finding the estimate of the local functions 
in the Fourier paradigm itself. A few details regarding this and the complete flowchart has 
been discussed in the Appendix. 



Appendix 

Detecting Local Features using the Fourier Paradigm 

Suppose / denotes the density of the distribution. Let P denote the measure on the space. 
Correspondingly P n denotes the empirical cdf. Note that by the Riesz Representation The- 
orem, P can be thought of as an operator working on / such that 

/dP (64) 



Let T = [0, 2tt) = |J A nj where A nj = ^„' J for j = 0, 1, . . . , 2 n - 1 is a partition 

of the circle. Note that 2 n A nj - 2nj = [0,2vr). Now for y £ [0, 1), / ( 27r( 2 J n +y) ) is the local 
information of / on A n j. P n is a measure on [0, 2ir). Let the sample from A n j be denoted 



by S n j and S n = U^t) 1 Snj- Thus 2 n S n j — 2nj maps S n j to [0, 2ir). As n becomes larger 
the information becomes more and more local. 

Define F n j = J2 x eS nj ~ ■)> as the empirical cdf on A nj -. Thus we get, 

F nj (l) = [ dF nj = \S nj \ (65) 



r 



nj 



(/) = E /(*) ( 66 ) 



Denote the conditional empirical cdf on A n j as Q n j = r^rj- Now consider the 2 n -th 
Fourier coefficient, i.e, 



/•27T 

l(z 2- ) = P B(e a-. )= / e ^ dp ^ 
■/ o 

2 n -l , 

c dP„ =J2L e i2 " x dF r , 

j=0 

= E E e^^EVn.lQn,^ 

Thus, 

e j2n;E dP n = E l^-IQniCe") (67) 
Jo 3 =0 

where Q n j(e lx ) is the scattered empirical Fourier coefficients which capture the localisation 
property at A n j. Thus for each element A n j of the partition we have a localisation value as 
Qnj( e ' ix ) if \S n j\ > and otherwise. We can consider this as the new data, as if we have 
only these 2 n data points, and we try to find the density value at these points. As mentioned 
before, we can see that because of equation (58) the localisation property is lost as the 2 n -th 
Fourier coefficient is calculated by a weighted average of these local information. 

Thus, if we can use the information present in each A n j to detect the presence of discon- 
tinuity or edges, through Qnjie 1 *), and estimate the same, then the whole procedure can 
be generalized to the Fourier paradigm. We are still trying to accomplish this and hope to 
succeed in the near future. 



2 n -l . 2 n -l , 2^0+1) 

= E / e* 2 "*' 

j=0 A nj j = Q " ari 

2 n_ 1 2 n - 



j = 2nj < 2n(j + l) j=0 



»2tt 



Algorithm for Overall Density Estimation using Fourier Spline Technique 

Step 1 : Considering the given data on the circle, find an n large enough and break the 
circle into layers of partitions of size 2 k for k = 2, . . . , n. Call the j'-th unit of the A;-th 
layer as A n j. 

Step 2 : Execute the algorithm for Support and Outlier detection as explained in Section 
3.2. Finally perform the Holm's procedure to get the region of the support and the 
location of the outliers. 

Step 3 : Remove the portions which are outside the support, or are outliers. In the re- 
maining portion that perform the Discontinuity and Edge Detection using the algorithm 
as given in Section 3.3. Finally perform the Holm's procedure to get the locations of the 
discontinuities. 



Step 4 : Keep a note of the A^s having a discontinuity. Incorporate them during the 
creation of the open cover of the subset. 

Step 5 : Create the partition of unity, by considering the open cover created in the pre- 
vious step. 

Step 5 : Estimate the local features such as discontinuities and outliers by estimating the 
local exponential family, using the data from their corresponding support. The method 
is described in Section 4.1. 

Step 6 : Estimate the smooth function by the Fourier Spline Procedure, by considering 
the weighted empirical coefficients. Details in Section 2 and Section 4.2. 

Step 7 : Finally the overall estimate is obtained by proper normalisation. The final esti- 
mate is give by equation (50). 
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